Minimum Error Rate Training Semiring

Authors

  • Artem Sokolov
  • François Yvon
Abstract

Modern Statistical Machine Translation (SMT) systems make their decisions based on multiple information sources, which assess various aspects of the match between a source sentence and its possible translation(s). Tuning an SMT system consists in finding the right balance between these sources so as to produce the best possible output, and is usually achieved through Minimum Error Rate Training (MERT) (Och, 2003). In this paper, we recast the operations involved in MERT in terms of operations over a specific semiring, which, in particular, enables us to derive a simple and generic implementation of MERT over word lattices.
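The semiring view can be illustrated with a minimal sketch. It assumes the usual MERT line-search setting, in which each hypothesis scores as a line in the step size gamma, and it is not taken from the paper: the element representation, the function names (upper_envelope, plus, times, inside_envelope, breakpoints) and the toy lattice are all hypothetical. Elements are upper envelopes of lines, "plus" merges two envelopes, and "times" adds lines pairwise. A real MERT implementation would also attach error statistics to each line, which is omitted here.

```python
# Illustrative sketch only: an "upper envelope" semiring for line search.
# A line gamma -> slope * gamma + intercept is stored as (slope, intercept).

def upper_envelope(lines):
    """Return the lines that are maximal for some gamma, sorted by slope."""
    best = {}
    for m, b in lines:  # among parallel lines keep the highest one
        if m not in best or b > best[m]:
            best[m] = b
    hull = []
    for m, b in sorted(best.items()):
        while len(hull) >= 2:
            (m1, b1), (m2, b2) = hull[-2], hull[-1]
            # The middle line is redundant if the new line overtakes
            # hull[-2] no later than the middle line does.
            if (b1 - b) * (m2 - m1) <= (b1 - b2) * (m - m1):
                hull.pop()
            else:
                break
        hull.append((m, b))
    return hull

ZERO = []        # empty set of lines: identity for plus
ONE = [(0, 0)]   # the zero line: identity for times

def plus(a, b):
    """Semiring addition: envelope of the union of two line sets."""
    return upper_envelope(a + b)

def times(a, b):
    """Semiring multiplication: envelope of all pairwise line sums."""
    return upper_envelope([(m1 + m2, b1 + b2) for m1, b1 in a for m2, b2 in b])

def inside_envelope(edges, source, target):
    """Envelope over all source-to-target paths of an acyclic lattice.

    edges: list of (u, v, (slope, intercept)); node ids are assumed to be
    numbered in topological order.
    """
    inside = {source: ONE}
    for u, v, line in sorted(edges, key=lambda e: e[0]):
        if u in inside:
            inside[v] = plus(inside.get(v, ZERO), times(inside[u], [line]))
    return inside.get(target, ZERO)

def breakpoints(envelope):
    """Values of gamma at which the best-scoring line changes."""
    return [(b1 - b2) / (m2 - m1)
            for (m1, b1), (m2, b2) in zip(envelope, envelope[1:])]

# Hypothetical toy lattice: two alternative edges at each of two positions.
edges = [(0, 1, (1, 0)), (0, 1, (-1, 1)), (1, 2, (0, 0)), (1, 2, (2, -3))]
env = inside_envelope(edges, 0, 2)
print(env)               # [(-1, 1), (1, 0), (3, -3)]
print(breakpoints(env))  # [0.5, 1.5]
```

In this toy lattice, four paths collapse to three envelope lines, so only the intervals below 0.5, between 0.5 and 1.5, and above 1.5 need to be evaluated during the line search.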

Similar resources

Minimum Error Rate Training and the Convex Hull Semiring

We describe the line search used in the minimum error rate training algorithm (Och, 2003) as the “inside score” of a weighted proof forest under a semiring defined in terms of well-understood operations from computational geometry. This conception leads to a straightforward complexity analysis of the dynamic programming MERT algorithms of Macherey et al. (2008) and Kumar et al. (2009) and practi...


Optimizing Expected Word Error Rate via Sampling for Speech Recognition

State-level minimum Bayes risk (sMBR) training has become the de facto standard for sequence-level training of speech recognition acoustic models. It has an elegant formulation using the expectation semiring, and gives large improvements in word error rate (WER) over models trained solely using cross-entropy (CE) or connectionist temporal classification (CTC). sMBR training optimizes the expecte...


Lattice-Based Minimum Error Rate Training Using Weighted Finite-State Transducers with Tropical Polynomial Weights

Minimum Error Rate Training (MERT) is a method for training the parameters of a log-linear model. One advantage of this method of training is that it can use the large number of hypotheses encoded in a translation lattice as training data. We demonstrate that the MERT line optimisation can be modelled as computing the shortest distance in a weighted finite-state transducer using a tropical polyn...


First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests

Many statistical translation models can be regarded as weighted logical deduction. Under this paradigm, we use weights from the expectation semiring (Eisner, 2002), to compute first-order statistics (e.g., the expected hypothesis length or feature counts) over packed forests of translations (lattices or hypergraphs). We then introduce a novel second-order expectation semiring, which computes se...


Improved performance and generalization of minimum classification error training for continuous speech recognition

Discriminative training of hidden Markov models (HMMs) using segmental minimum classification error (MCE) training has been shown to work extremely well for certain speech recognition applications. It is, however, somewhat prone to overspecialization. This study investigates various techniques which improve performance and generalization of the MCE algorithm. Improvements of up to 7% in relative...



Publication date: 2011